Combining synthetic data with subsampling to create public use microdata files for large scale surveys
Abstract
To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces the risk of unintended disclosure of survey participants’ confidential information. However, subsampling does not eliminate that risk, so the data still need to be altered before dissemination. We propose to create disclosure-protected subsamples from large scale surveys based on multiple imputation. The idea is to replace identifying or sensitive values in the original sample with draws from statistical models, and then release subsamples of the disclosure-protected data. We present methods for making inferences with the multiple synthetic subsamples.
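As a rough illustration of the idea described in the abstract, the sketch below replaces a sensitive variable with draws from a fitted model and then releases random subsamples of the protected data. It is a toy under stated assumptions, not the authors' procedure: the simple normal linear model, the plug-in parameter estimates (a full multiple imputation approach would also draw the model parameters), the choice of an independent subsample for each release, and the names `fit_simple_linear` and `synthetic_subsamples` are all hypothetical choices made for this example.

```python
import random
import statistics


def fit_simple_linear(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return a, b, sd


def synthetic_subsamples(records, m, n_sub, seed=0):
    """Return m disclosure-protected subsamples of size n_sub.

    records: list of (x, y) pairs, where x is a non-sensitive covariate
    and y is the sensitive value to be replaced (an assumed toy layout).
    """
    rng = random.Random(seed)
    x = [r[0] for r in records]
    y = [r[1] for r in records]
    # Assumption: plug-in estimates; parameter uncertainty is ignored here.
    a, b, sd = fit_simple_linear(x, y)
    releases = []
    for _ in range(m):
        # Step 1: replace every sensitive y with a draw from the fitted model.
        synthetic = [(xi, a + b * xi + rng.gauss(0.0, sd)) for xi in x]
        # Step 2: release a simple random subsample (without replacement).
        releases.append(rng.sample(synthetic, n_sub))
    return releases
```

For example, `synthetic_subsamples(records, m=5, n_sub=1000)` would return five subsamples of 1,000 records each, in which the covariate is original and the sensitive value is synthetic. How estimates from the released subsamples should be combined is the subject of the paper's inference methods and is not attempted here.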
Similar resources
Combining Methods to Create Synthetic Microdata: Quantile Regression, Hot Deck, and Rank Swapping
Government agencies must simultaneously disseminate useful microdata and maintain confidentiality of individual records. Releasing synthetic data is one approach. We propose to create synthetic data using a combination of quantile regression, hot deck imputation, and rank swapping. The result is a releasable data set containing original values for a few key variables, synthetic quantile regress...
Masking and Re-identification Methods for Public-Use Microdata: Overview and Research Problems
This paper provides an overview of methods of masking microdata so that the data can be placed in public-use files. It divides the methods according to whether they have been demonstrated to provide analytic properties or not. For those methods that have been shown to provide one or two sets of analytic properties in the masked data, we indicate where the data may have limitations for most anal...
Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes
(2001). Seeking explanation in theory: Reflections on the social practices of organizations that distribute public use microdata files for research purposes. Public concern about personal privacy has recently focused on issues of Internet data security and personal information as big business. The scientific discourse about information privacy focuses on the cross-pressures of maintaining conf...
SYNTHETIC DATA FOR SMALL AREA ESTIMATION IN THE AMERICAN COMMUNITY SURVEY
Small area estimates provide a critical source of information used to study local populations. Statistical agencies regularly collect data from small areas but are prevented from releasing detailed geographical identifiers in public-use data sets due to disclosure concerns. Alternative data dissemination methods used in practice include releasing summary/aggregate tables, suppressing detailed g...
Assessing the Statistical Disclosure Risk of a Demographic Microdata File
There are two recent developments related to survey data dissemination that may be increasing the risk of disclosure of respondent data. One is that statistical agencies are now releasing more microdata files than previously, partly in response to the urging of researchers who need the data for precise analytic work. For example, some data-rich files with possibly high disclosure risk, that have...